Git and GitHub for Remote Collaboration

@javirudolph

Why should I use GitHub?

  • Maintaining code for scientific collaboration is the main objective.
  • GitHub stores code, tracks changes, and enables collaboration on it.
    • It is the most widely used platform for version control and collaboration.
      • It integrates communication features: issues, discussions, and pages.
      • You can engage and collaborate on code, but also publish information to a web page.

What is the difference between Git and GitHub?

  • Git is the version control system that powers all the collaborative tools available on GitHub.
  • Git launched in 2005.
    • Basic Git concepts: commit, push, pull, checkout.
  • Git operations are typically run through the terminal.
  • GitHub is a web-based platform, with a companion desktop application, that lets users perform most repository and data management operations without the command line.
    • This makes these features available to users less familiar with software development.

Why GitHub?

GitHub, GitLab, and Bitbucket are all similar: they provide hosting services, which are basically a home for your project on the internet.

It’s like having a Dropbox or Google Drive, but for Git-based projects.

This allows other people to see your stuff, synchronize it, and contribute.

Some GitHub features

  • Well-designed user interface
  • Issues: originally a bug tracker, but highly underutilized in our fields
  • Smoother R and GitHub integration, thanks to the active R package development community

An intro on this can be found here

Step-by-step process:

Artwork by @allison_horst

  1. Create a remote repository and sync it with your local files and directories.
  2. Modify files locally or remotely.
  3. Frequently ‘commit’ the changes, with a description of each change.
  4. Synchronize commits with GitHub (push and pull).
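The four steps above map onto a handful of Git commands. Below is a minimal Python sketch, assuming Git is installed, the GitHub remote is named `origin`, and the branch is `main`; the helper names `sync_steps` and `run` and the `<user>/<repo>` URL are hypothetical placeholders. By default it only prints the commands (a dry run), so you can check them before anything is executed.

```python
import shlex
import subprocess


def sync_steps(message: str, branch: str = "main") -> list[list[str]]:
    """Return the Git commands for one commit-and-sync cycle (steps 3 and 4)."""
    return [
        ["git", "add", "-A"],               # stage new, modified, and deleted files
        ["git", "commit", "-m", message],   # step 3: record the change with a description
        ["git", "pull", "origin", branch],  # step 4: bring in collaborators' commits first
        ["git", "push", "origin", branch],  # step 4: send your commits to GitHub
    ]


def run(commands: list[list[str]], cwd: str = ".", dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing command (e.g. a merge conflict)
            subprocess.run(cmd, cwd=cwd, check=True)


# Step 1 (once): after creating the repository on GitHub, copy it to your computer.
# "<user>/<repo>" is a placeholder, not a real project.
run([["git", "clone", "https://github.com/<user>/<repo>.git"]])
# Step 2 happens in your editor; steps 3 and 4 record and share the change.
run(sync_steps("Add data-cleaning script"))
```

Pulling before pushing means any conflicts with a collaborator's work show up on your machine, where you can resolve them, instead of the push being rejected.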

The repository contains the files, their modifications, and the descriptions of these changes. Others can download a copy of the repository (‘clone’ it) and synchronize it with their own changes.

Practical ways to use it:

  1. Storage

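Important: what NOT to track. Just because you can version control something doesn’t mean that you should. Git is best reserved for plain-text documents such as scripts and notes. Git keeps every committed version of a file, but plain text compresses well and changes can be compared line by line, so the history stays small and readable. Binary files such as Excel and Word documents cannot be compared this way, and each saved version adds what is largely a new copy to the repository. Things not to version control: large data files (especially ones that never change), binary files like Excel and Word documents, and the output of your code. If your code is fully reproducible, you shouldn’t need to store its output.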

1.a. Better long-term ‘storage’: Zenodo or a similar archive.
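To make the "what NOT to track" advice concrete, here is a minimal Python sketch that walks a project folder and flags files that are probably better left out of Git: binary office formats, files above a size threshold, and anything inside an output folder. The extension list, the 50 MB threshold, the `output` folder name, and the `files_to_ignore` helper are all assumptions to adapt, not Git rules. The flagged paths (or matching patterns) can then go into the project's `.gitignore` file so Git stops offering to track them.

```python
from pathlib import Path

# Assumed rules of thumb; adjust them for your own project.
BINARY_EXTENSIONS = {".xlsx", ".xls", ".docx", ".doc", ".pptx", ".pdf"}
OUTPUT_DIRS = {"output"}          # hypothetical folder holding generated results
MAX_BYTES = 50 * 1024 * 1024      # 50 MB


def files_to_ignore(root: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return project-relative paths that are likely poor candidates for Git."""
    root_path = Path(root)
    flagged = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        if ".git" in rel.parts:
            continue  # skip Git's own internal files
        is_binary = path.suffix.lower() in BINARY_EXTENSIONS
        is_output = any(part in OUTPUT_DIRS for part in rel.parts[:-1])
        is_large = path.stat().st_size > max_bytes
        if is_binary or is_output or is_large:
            flagged.append(rel.as_posix())
    return flagged


if __name__ == "__main__":
    # Print candidates for .gitignore in the current project folder.
    print("\n".join(files_to_ignore(".")))
```

The script only reports; deciding what goes into `.gitignore` stays with you, since a small reference spreadsheet may still be worth tracking.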

  2. Project continuity. Many researchers hold limited-term appointments, and documents kept only on personal computers are hard to hand over when people move on. A shared repository makes code and data handover easier.

  3. Project management for highly collaborative research. GitHub Issues track discrete tasks and sub-tasks: identify, assign, categorize, and keep a history of them. GitHub Discussions provide a message board for conversation. GitHub Projects offer real-time tracking of project priorities and status.
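Issues can also be opened from a script, for example one task per field site. Below is a minimal Python sketch using only the standard library and GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues`, assuming you have a personal access token allowed to write issues. The owner, repository, title, label, and username values are hypothetical. The function builds the request without sending it, so you can inspect it first.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_issue_request(owner, repo, title, body, token, labels=(), assignees=()):
    """Build (but do not send) a request that opens a GitHub issue."""
    payload = {
        "title": title,
        "body": body,
        "labels": list(labels),        # categorize the task
        "assignees": list(assignees),  # who is responsible for it
    }
    return urllib.request.Request(
        url=f"{API}/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_issue_request(
    "my-lab", "field-survey",          # hypothetical owner and repository
    "Clean 2023 site data",
    "Remove duplicate plot IDs before analysis.",
    token="<your-token>",              # placeholder; never commit a real token
    labels=["data-cleaning"],
    assignees=["a-collaborator"],
)
# To actually create the issue: urllib.request.urlopen(req)
```

Assignees must be collaborators on the repository, and labels that do not exist yet are created by GitHub when the issue is opened.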

Example of a paper with good tracking of issues and discussions: https://github.com/fmsabatini/sPlotOpen_Manuscript

How do you work with Git and GitHub?

For R users, the best simple and straightforward resource out there is Happy Git with R.

GitHub itself has a dedicated learning section in its docs; in particular, the Hello World tutorial walks you through creating a repository, managing a branch, and merging a pull request.

Branches and pull requests

  1. Create a branch to make a change.
  2. Commit changes to the new branch.
  3. Open a pull request to merge the changes into the main branch.
  4. Optional but recommended: delete the branch after merging.

This is also known as the feature branch workflow.

Source: https://www.nobledesktop.com/learn/git/git-branches
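The feature branch workflow can be sketched as a sequence of commands. Below is a minimal Python sketch that lists them, assuming Git 2.23 or later (for `git switch`), the GitHub CLI (`gh`) to open the pull request from the terminal, a remote named `origin`, and `main` as the base branch. The branch name, commit message, and `feature_branch_commands` helper are hypothetical. You can just as well open the pull request on the GitHub website.

```python
import shlex


def feature_branch_commands(branch: str, message: str, base: str = "main") -> list[list[str]]:
    """List the commands for one pass through the feature branch workflow."""
    return [
        # 1. Create a branch to make a change (and switch to it)
        ["git", "switch", "-c", branch],
        # 2. Commit changes to the new branch, then publish the branch on GitHub
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push", "-u", "origin", branch],
        # 3. Open a pull request to merge the branch into the base branch
        ["gh", "pr", "create", "--base", base, "--head", branch,
         "--title", message, "--body", "Changes from branch " + branch],
        # 4. Only after the pull request is merged: return to the base branch,
        #    update it, then delete the feature branch locally and on GitHub
        #    (git branch -d refuses to delete a branch it considers unmerged)
        ["git", "switch", base],
        ["git", "pull", "origin", base],
        ["git", "branch", "-d", branch],
        ["git", "push", "origin", "--delete", branch],
    ]


for cmd in feature_branch_commands("fix-plot-labels", "Fix axis labels in Figure 2"):
    print(shlex.join(cmd))
```

Keeping each change on its own short-lived branch means `main` always holds reviewed work, and the pull request becomes the place where collaborators discuss the change before it is merged.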